PM566 Final

Author

Nicole Tang

Data Analysis of COVID-19 Vaccine Hesitancy and Possible Demographic and Geographic Correlations

Introduction

COVID-19 vaccine hesitancy refers to the reluctance or refusal to get vaccinated despite the availability of vaccines. Vaccination plays a crucial role in controlling the pandemic by reducing the spread of the virus, preventing severe illness, and decreasing hospitalization and death rates. The COVID-19 vaccines have been proven to be highly effective in boosting immunity and protecting not only individuals but also communities by contributing to herd immunity. However, hesitancy has been influenced by factors such as misinformation, distrust in healthcare systems or government authorities, concerns about the speed of vaccine development, and fears about potential side effects. Social, cultural, and political contexts have also shaped people’s attitudes toward vaccines. Addressing vaccine hesitancy requires comprehensive public health strategies that include transparent communication, community engagement, and efforts to build trust by addressing the specific concerns and barriers faced by different populations.

The CDC has published a 2021 data set on COVID-19 vaccine hesitancy. It reports estimates by county and state, along with demographic information such as ethnicity and vulnerability. Hesitancy is measured as a percentage of the population at three levels: hesitant, hesitant or unsure, and strongly hesitant. Data set origin: https://data.cdc.gov/Vaccinations/Vaccine-Hesitancy-for-COVID-19-County-and-local-es/q9mh-h2tw/about_data

My objective is to identify possible correlations between demographic and geographic factors and rates of vaccine hesitancy.

Research Question

Are there any correlations between demographic and geographic factors and the rates of vaccine hesitancy?

Methods

Data Cleaning and Wrangling

I downloaded a CSV file from the CDC website and read it into a data frame. Rows containing NA values were removed. One column held latitude and longitude together as a single value of type “Point”, so I created two new columns, one for latitude and one for longitude, to make the coordinates easier to use in later visualizations.
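The cleaning step can be sketched as follows. This is an illustration in Python using only the standard library, not the code used for this report. It assumes the “Point” column is stored as well-known text of the form `POINT (longitude latitude)`. The column names and the three sample rows are made up for the example.

```python
import csv
import io
import re

# Assumed format: "POINT (<longitude> <latitude>)", longitude first.
POINT_RE = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)


def parse_point(text):
    """Return (latitude, longitude) from a POINT string, or None if malformed."""
    match = POINT_RE.fullmatch(text.strip())
    if match is None:
        return None
    longitude, latitude = float(match.group(1)), float(match.group(2))
    return latitude, longitude


def clean_rows(rows, point_column):
    """Drop rows with missing values and split the point into two columns."""
    cleaned = []
    for row in rows:
        # Treat empty cells and the literal string "NA" as missing.
        if any(v is None or v.strip() in ("", "NA") for v in row.values()):
            continue
        coords = parse_point(row[point_column])
        if coords is None:
            continue
        new_row = dict(row)
        new_row["latitude"], new_row["longitude"] = coords
        cleaned.append(new_row)
    return cleaned


# Made-up sample data with hypothetical column names.
SAMPLE = """state,hesitant,geo_point
AL,0.18,POINT (-86.844516 32.756889)
AK,NA,POINT (-151.5 63.5)
VT,0.07,POINT (-72.6 44.0)
"""

rows = clean_rows(csv.DictReader(io.StringIO(SAMPLE)), "geo_point")
for row in rows:
    print(row["state"], row["latitude"], row["longitude"])
```

The AK row is dropped because its hesitancy value is missing, which mirrors the NA removal described above.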

Aggregate Hesitancy Rates by State

I created a data frame keyed by state containing the mean estimated hesitancy, and a second data frame keyed by state containing the mean non-hesitancy. Non-hesitancy was calculated by subtracting the hesitant, hesitant or unsure, and strongly hesitant percentages from 100. I joined these two data frames so that both hesitancy variables sit in one table. I then built another data frame keyed by state with the latitude and longitude variables and merged it with the hesitancy table. The result has state as the primary key and four variables: estimated hesitant, not hesitant, latitude, and longitude. The summary statistics are tabulated below.
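A minimal sketch of this aggregation, in Python with the standard library, is shown here as an illustration rather than the report's actual code. It computes all four state-level variables in a single pass instead of building and joining separate tables, which should give the same result when every row has all fields. The field names and the two sample counties are hypothetical, and the coordinates are averaged per state as one plausible way to get a single point per state.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_state(rows):
    """Return {state: {...}} with mean hesitancy, non-hesitancy, and coordinates."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["state"]].append(row)

    summary = {}
    for state, members in grouped.items():
        summary[state] = {
            "mean_hesitant": mean(r["hesitant"] for r in members),
            # Non-hesitancy as described in the text: 100 minus the three levels.
            "mean_not_hesitant": mean(
                100
                - (r["hesitant"] + r["hesitant_or_unsure"] + r["strongly_hesitant"])
                for r in members
            ),
            # Assumption: one representative point per state, the mean of its rows.
            "latitude": mean(r["latitude"] for r in members),
            "longitude": mean(r["longitude"] for r in members),
        }
    return summary


# Made-up rows for a fictional state "AA"; values are percentages.
sample_rows = [
    {"state": "AA", "hesitant": 10.0, "hesitant_or_unsure": 15.0,
     "strongly_hesitant": 5.0, "latitude": 40.0, "longitude": -100.0},
    {"state": "AA", "hesitant": 20.0, "hesitant_or_unsure": 25.0,
     "strongly_hesitant": 10.0, "latitude": 42.0, "longitude": -102.0},
]

state_summary = summarize_by_state(sample_rows)
print(state_summary["AA"])
```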

Summary Statistics for Vaccine Hesitancy

| Measure | Mean (%) | SD | Min (%) | Max (%) |
| --- | --- | --- | --- | --- |
| Hesitant | 12.37017 | 5.2458 | 4.026429 | 25.13857 |
| Not hesitant | 61.69957 | 15.01524 | 27.73661 | 85.12571 |
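Summary statistics of this kind can be computed from the state-level means with a short helper. The sketch below is illustrative Python, not the report's code. It uses the sample standard deviation (dividing by n - 1), which is an assumption, since the report does not say which definition was used. The input values are made up.

```python
from statistics import mean, stdev


def summarize(values):
    """Return mean, sample SD, min, and max for a list of state-level rates."""
    values = list(values)
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample SD; needs at least two values
        "min": min(values),
        "max": max(values),
    }


# Made-up state-level hesitancy means, in percent.
stats = summarize([10.0, 20.0, 30.0])
print(stats)
```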

Aggregate Hesitancy Rates by Ethnicity

The data set has one column per ethnicity giving the percentage of that ethnicity in the region. I created a new categorical column whose value is the predominant ethnicity of each location, then averaged the estimated hesitancy within each predominant-ethnicity group.
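The predominant-ethnicity step amounts to picking, for each row, the ethnicity column with the largest percentage and then averaging hesitancy within each resulting group. A hedged Python sketch follows. The column names (`pct_group_a` and so on) and the sample rows are placeholders, not the data set's real column names or values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-ins for the data set's per-ethnicity percentage columns.
GROUP_COLUMNS = ["pct_group_a", "pct_group_b", "pct_group_c"]


def predominant_group(row, group_columns=GROUP_COLUMNS):
    """Return the name of the column with the largest share in this row.

    On an exact tie, the column listed first wins.
    """
    return max(group_columns, key=lambda column: row[column])


def mean_hesitancy_by_group(rows):
    """Average the hesitancy estimate within each predominant group."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[predominant_group(row)].append(row["hesitant"])
    return {group: mean(values) for group, values in by_group.items()}


# Made-up rows; shares are fractions of the population, hesitancy is in percent.
sample_rows = [
    {"pct_group_a": 0.70, "pct_group_b": 0.20, "pct_group_c": 0.10, "hesitant": 10.0},
    {"pct_group_a": 0.60, "pct_group_b": 0.30, "pct_group_c": 0.10, "hesitant": 14.0},
    {"pct_group_a": 0.10, "pct_group_b": 0.80, "pct_group_c": 0.10, "hesitant": 20.0},
]

group_means = mean_hesitancy_by_group(sample_rows)
print(group_means)
```

Note that a group only appears in the result if it is the largest group in at least one location, so small groups may be represented by very few rows.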

Visualization Results

Visualization: Mean Hesitancy Rates by State

I created a bar chart to see which states have the highest rates of hesitancy. MT has the highest rate, followed by WY and AK, which have similar rates. VT has the lowest rate. Based on this visualization and the gap between the maximum and minimum estimated rates, I would conclude that hesitancy rates are associated with state.

Visualization: Mean Hesitancy and Non-Hesitancy Rates by State

I created a bar chart that shows each state's hesitancy rate alongside its non-hesitancy rate. MT has the highest rate of hesitancy and the lowest rate of non-hesitancy. It is followed by WY and AK, which have similar rates of hesitancy; however, AK has a higher rate of non-hesitancy than WY. VT has the lowest rate of hesitancy and MA has the second lowest, yet MA has the highest rate of non-hesitancy. Based on this visualization, I would conclude that hesitancy and non-hesitancy appear to be inversely related: the higher the rate of hesitancy, the lower the rate of non-hesitancy. I would also conclude that mean hesitancy and non-hesitancy rates differ across states, indicating an association between state and hesitancy rates.
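The inverse relationship read off the chart could also be quantified with a Pearson correlation coefficient between the state-level hesitancy and non-hesitancy rates. The report does not compute one, so the Python sketch below is only a suggestion of how that check might look, and the input values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up state-level rates in percent, chosen to fall exactly on a line.
hesitant = [5.0, 10.0, 15.0, 20.0]
not_hesitant = [80.0, 70.0, 60.0, 50.0]

r = pearson(hesitant, not_hesitant)
print(round(r, 3))
```

A value near -1 would support the inverse relationship; a value near 0 would suggest the pattern in the chart is weaker than it looks.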

Visualization: Mean Hesitancy Rates by Social Vulnerability Index

I also wanted to look at demographic factors other than state or region. The Social Vulnerability Index (SVI) was categorized as Very Low (0.0-0.19), Low (0.20-0.39), Moderate (0.40-0.59), High (0.60-0.79), or Very High (0.80-1.0). I made a box plot so that the maximum, minimum, and median are visible for each category. The High vulnerability category has the highest average estimated hesitancy, and the Very Low category has the lowest. This is interesting, because one might expect more vulnerable populations to be less hesitant.
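The SVI categories above can be expressed as a simple binning function, with the median hesitancy then taken per category as a box plot would show it. This is an illustrative Python sketch, not the report's code. It assumes that values falling between the stated ranges (for example 0.195) belong to the lower category, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import median

# Upper bounds (exclusive) for each category below "Very High".
SVI_BINS = [(0.20, "Very Low"), (0.40, "Low"), (0.60, "Moderate"), (0.80, "High")]


def svi_category(svi):
    """Map an SVI value in [0, 1] to its category label."""
    if not 0.0 <= svi <= 1.0:
        raise ValueError(f"SVI must be between 0 and 1, got {svi}")
    for upper, label in SVI_BINS:
        if svi < upper:
            return label
    return "Very High"


def median_hesitancy_by_category(rows):
    """Median hesitancy estimate within each SVI category."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[svi_category(row["svi"])].append(row["hesitant"])
    return {label: median(values) for label, values in by_category.items()}


# Made-up rows; hesitancy is in percent.
sample_rows = [
    {"svi": 0.05, "hesitant": 8.0},
    {"svi": 0.15, "hesitant": 10.0},
    {"svi": 0.65, "hesitant": 18.0},
    {"svi": 0.70, "hesitant": 22.0},
    {"svi": 0.75, "hesitant": 20.0},
]

medians = median_hesitancy_by_category(sample_rows)
print(medians)
```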

Visualization: Mean Hesitancy Rates by CVAC

I also wanted to look at hesitancy by vaccination level, using the CVAC categories. The graph suggests that there is no correlation between hesitancy rates and vaccination level.

Visualization: Mean Hesitancy Rates by Ethnicity

Using the data frame of average hesitancy rates by predominant ethnicity, I made a bar plot to see which ethnic group had the highest average. Based on this visualization, locations where non-Hispanic American Indian/Alaska Native is the predominant group had the highest rates of hesitancy, and locations where non-Hispanic Native Hawaiian/Pacific Islander is the predominant group had the lowest.

Visualization: Mean Hesitancy Rates by Ethnicity

I also looked at hesitancy rates by ethnicity in a scatter plot, so that each individual location is visible and the groups can be compared. Some groups, such as non-Hispanic Asian, appear to have lower overall hesitancy, while groups such as non-Hispanic Black and non-Hispanic American Indian/Alaska Native show a wider spread and higher average hesitancy.

Conclusion

The variables examined for a possible relationship with vaccine hesitancy rates were geographic location, Social Vulnerability Index, ethnicity, and vaccination level.

Geography: Based on the state-level visualizations and the gap between the maximum and minimum estimated rates, I would conclude that hesitancy rates are associated with state. I would also conclude that hesitancy and non-hesitancy appear to be inversely related: the higher the rate of hesitancy, the lower the rate of non-hesitancy.

Social Vulnerability Index (SVI): High vulnerability has the highest average estimated hesitancy. Very low vulnerability has the lowest rates of hesitancy.

Ethnicity: Locations where non-Hispanic American Indian/Alaska Native is the predominant group had the highest rates of hesitancy, and locations where non-Hispanic Native Hawaiian/Pacific Islander is the predominant group had the lowest.

Vaccination Level: The graph suggests that there is no correlation between hesitancy rates and vaccination level.

Based on these results, interventions to promote vaccination can be targeted at specific states and ethnic groups. I would also suggest focusing on highly vulnerable populations. However, the data suggest that there is no need to specifically target areas with low vaccination levels.